Abstract: Cluster analysis is one of the primary approaches in Machine Learning. Applications involving clustering often deal with large, high-dimensional datasets, and clustering such datasets is computationally expensive: clustering algorithms typically iterate many times before converging to a solution. One way to speed up the clustering process is parallel processing. Parallel programs make use of Graphics Processing Units (GPUs) and/or multi-core CPUs to reduce computation time, and there is considerable scope to parallelize clustering algorithms to obtain faster results. This paper reviews parallel implementations of three clustering algorithms, viz. k-means, DBSCAN and Expectation-Maximization. We survey the Shared Memory and Message Passing models of parallel programming and how clustering algorithms have been implemented using them. We also highlight a few applications involving large data where parallel programming would be helpful.
Keywords: Machine Learning, Clustering, Clustering Algorithms, Parallel Processing, High Dimensional Data.